Overview

Dataset statistics

Number of variables 25
Number of observations 10226
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 2.0 MiB
Average record size in memory 200.0 B
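The record size and total size above agree with each other if every cell is assumed to cost 8 bytes (for example, one 64-bit reference per value). That 8-byte figure is an assumption made here for illustration; the report does not state it. A minimal sketch of the arithmetic:

```python
# Assumption (not stated in the report): each cell costs 8 bytes.
BYTES_PER_CELL = 8
N_VARIABLES = 25        # "Number of variables"
N_OBSERVATIONS = 10226  # "Number of observations"

record_bytes = N_VARIABLES * BYTES_PER_CELL   # bytes per row
total_bytes = record_bytes * N_OBSERVATIONS   # bytes for the whole table
total_mib = total_bytes / 2**20               # convert bytes to MiB

print(f"{record_bytes:.1f} B per record")
print(f"{total_mib:.1f} MiB in total")
```

Under that assumption the sketch gives 200 bytes per record and roughly 1.95 MiB overall, which rounds to the reported 2.0 MiB.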

Variable types

Categorical 14
Text 11

Reproduction

Analysis started 2024-11-05 02:03:07.538188
Analysis finished 2024-11-05 02:03:16.162053
Duration 8.62 seconds
Software version ydata-profiling v4.12.0
Download configuration config.json
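The reported duration can be checked directly from the two timestamps. The sketch below uses only the Python standard library; the format string is an assumption that matches how the timestamps are printed in this report.

```python
from datetime import datetime

# Format assumed from the way the report prints its timestamps.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-11-05 02:03:07.538188", FMT)
finished = datetime.strptime("2024-11-05 02:03:16.162053", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```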

Variables

Distinct 11
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
25-29 2523
30-34 2063
35-39 1420
22-24 1306
40-44 969
Other values (6) 1945

Length

Max length 5
Median length 5
Mean length 4.9906122
Min length 3

Characters and Unicode

Total characters 51034
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
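The mean length reported for this variable is consistent with dividing the total character count by the number of observations. A minimal check, using only figures printed in this report:

```python
total_characters = 51034  # "Total characters" for this variable
n_observations = 10226    # "Number of observations" for the dataset

# Mean string length = all characters divided by all rows.
mean_length = total_characters / n_observations
print(round(mean_length, 7))
```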

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 22-24
2nd row 40-44
3rd row 22-24
4th row 50-54
5th row 22-24

Common Values

Value Count Frequency (%)
25-29 2523 24.7%
30-34 2063 20.2%
35-39 1420 13.9%
22-24 1306 12.8%
40-44 969 9.5%
45-49 642 6.3%
50-54 464 4.5%
18-21 317 3.1%
55-59 264 2.6%
60-69 210 2.1%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
25-29 2523 24.7%
30-34 2063 20.2%
35-39 1420 13.9%
22-24 1306 12.8%
40-44 969 9.5%
45-49 642 6.3%
50-54 464 4.5%
18-21 317 3.1%
55-59 264 2.6%
60-69 210 2.1%

Most occurring characters

Value Count Frequency (%)
- 10178 19.9%
2 9281 18.2%
4 8024 15.7%
3 6966 13.6%
5 6305 12.4%
9 5059 9.9%
0 3754 7.4%
1 634 1.2%
6 420 0.8%
8 317 0.6%
Other values (2) 96 0.2%

Most occurring categories

Value Count Frequency (%)
(unknown) 51034 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 51034 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 51034 100.0%
Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
Male 8799
Female 1427

Length

Max length 6
Median length 4
Mean length 4.2790925
Min length 4

Characters and Unicode

Total characters 43758
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Male
2nd row Male
3rd row Male
4th row Male
5th row Male

Common Values

Value Count Frequency (%)
Male 8799 86.0%
Female 1427 14.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
male 8799 86.0%
female 1427 14.0%

Most occurring characters

Value Count Frequency (%)
e 11653 26.6%
a 10226 23.4%
l 10226 23.4%
M 8799 20.1%
F 1427 3.3%
m 1427 3.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 43758 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 43758 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 43758 100.0%
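The character counts for this variable follow directly from its two category counts. The sketch below is an illustration of that bookkeeping and is not the profiler's own implementation: each value contributes its characters once per row in which it appears.

```python
from collections import Counter

# Value counts copied from the Common Values table for this variable.
value_counts = {"Male": 8799, "Female": 1427}

char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # Every row holding this value adds one occurrence of each character.
        char_counts[ch] += count

total = sum(char_counts.values())
for ch, n in char_counts.most_common():
    print(ch, n, f"{100 * n / total:.1f}%")
```

The total of the counter matches the "Total characters" figure reported for the variable (43758).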
Distinct 59
Distinct (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB

Length

Max length 52
Median length 28
Mean length 10.89556
Min length 4

Characters and Unicode

Total characters 111418
Distinct characters 49
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row France
2nd row Australia
3rd row India
4th row France
5th row India

Value Count Frequency (%)
of 2202 12.0%
united 2122 11.6%
india 1879 10.2%
states 1838 10.0%
america 1838 10.0%
other 529 2.9%
brazil 456 2.5%
japan 402 2.2%
russia 362 2.0%
ireland 320 1.7%
Other values (63) 6391 34.8%

Most occurring characters

Value Count Frequency (%)
a 13319 12.0%
i 10131 9.1%
e 9929 8.9%
n 8713 7.8%
t 8232 7.4%
8113 7.3%
r 6512 5.8%
d 5732 5.1%
o 4093 3.7%
s 3341 3.0%
Other values (39) 33303 29.9%

Most occurring categories

Value Count Frequency (%)
(unknown) 111418 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 111418 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 111418 100.0%
Distinct 7
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
Master’s degree 4882
Bachelor’s degree 2680
Doctoral degree 1797
Professional degree 353
Some college/university study without earning a bachelor’s degree 308
Other values (2) 206

Length

Max length 65
Median length 15
Mean length 17.436534
Min length 15

Characters and Unicode

Total characters 178306
Distinct characters 31
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Master’s degree
2nd row Master’s degree
3rd row Bachelor’s degree
4th row Master’s degree
5th row Master’s degree

Common Values

Value Count Frequency (%)
Master’s degree 4882 47.7%
Bachelor’s degree 2680 26.2%
Doctoral degree 1797 17.6%
Professional degree 353 3.5%
Some college/university study without earning a bachelor’s degree 308 3.0%
I prefer not to answer 113 1.1%
No formal education past high school 93 0.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
degree 10020 43.5%
master’s 4882 21.2%
bachelor’s 2988 13.0%
doctoral 1797 7.8%
professional 353 1.5%
some 308 1.3%
college/university 308 1.3%
study 308 1.3%
without 308 1.3%
earning 308 1.3%
Other values (12) 1431 6.2%

Most occurring characters

Value Count Frequency (%)
e 40255 22.6%
r 21088 11.8%
s 14373 8.1%
12785 7.2%
a 11028 6.2%
g 10729 6.0%
d 10421 5.8%
o 8903 5.0%
t 8323 4.7%
7870 4.4%
Other values (21) 32531 18.2%

Most occurring categories

Value Count Frequency (%)
(unknown) 178306 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 178306 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 178306 100.0%
Distinct 10
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
Data Scientist 3243
Software Engineer 1842
Data Analyst 1153
Other 1118
Research Scientist 1071
Other values (5) 1799

Length

Max length 23
Median length 18
Mean length 14.306963
Min length 5

Characters and Unicode

Total characters 146303
Distinct characters 30
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Software Engineer
2nd row Other
3rd row Other
4th row Data Scientist
5th row Data Scientist

Common Values

Value Count Frequency (%)
Data Scientist 3243 31.7%
Software Engineer 1842 18.0%
Data Analyst 1153 11.3%
Other 1118 10.9%
Research Scientist 1071 10.5%
Product/Project Manager 530 5.2%
Business Analyst 509 5.0%
Data Engineer 448 4.4%
Statistician 203 2.0%
DBA/Database Engineer 109 1.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
data 4844 25.3%
scientist 4314 22.5%
engineer 2399 12.5%
software 1842 9.6%
analyst 1662 8.7%
other 1118 5.8%
research 1071 5.6%
product/project 530 2.8%
manager 530 2.8%
business 509 2.7%
Other values (2) 312 1.6%
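The word counts above can be reproduced from the category counts of this variable with a simple tokenizer. The sketch below is an approximation chosen for illustration (lowercase, split on whitespace, strip leading and trailing punctuation); it is not claimed to be the profiler's actual tokenization.

```python
import string
from collections import Counter

# Value counts copied from the Common Values table for this variable.
value_counts = {
    "Data Scientist": 3243,
    "Software Engineer": 1842,
    "Data Analyst": 1153,
    "Other": 1118,
    "Research Scientist": 1071,
    "Product/Project Manager": 530,
    "Business Analyst": 509,
    "Data Engineer": 448,
    "Statistician": 203,
    "DBA/Database Engineer": 109,
}

word_counts = Counter()
for value, count in value_counts.items():
    for token in value.lower().split():
        # Assumed normalisation: drop punctuation at the token edges only.
        token = token.strip(string.punctuation)
        if token:
            word_counts[token] += count

total_words = sum(word_counts.values())
for word, n in word_counts.most_common(5):
    print(word, n, f"{100 * n / total_words:.1f}%")
```

Because only the token edges are stripped, a compound such as "product/project" stays a single word, which agrees with the table.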

Most occurring characters

Value Count Frequency (%)
t 19872 13.6%
a 16056 11.0%
e 15892 10.9%
i 12145 8.3%
n 12016 8.2%
8905 6.1%
s 8886 6.1%
r 8020 5.5%
c 6648 4.5%
S 6359 4.3%
Other values (20) 31504 21.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 146303 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 146303 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 146303 100.0%
Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
0-49 employees 2849
> 10,000 employees 2327
1000-9,999 employees 2009
50-249 employees 1687
250-999 employees 1354

Length

Max length 20
Median length 18
Mean length 16.816155
Min length 14

Characters and Unicode

Total characters 171962
Distinct characters 17
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1000-9,999 employees
2nd row > 10,000 employees
3rd row 0-49 employees
4th row 0-49 employees
5th row 50-249 employees

Common Values

Value Count Frequency (%)
0-49 employees 2849 27.9%
> 10,000 employees 2327 22.8%
1000-9,999 employees 2009 19.6%
50-249 employees 1687 16.5%
250-999 employees 1354 13.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
employees 10226 44.9%
0-49 2849 12.5%
2327 10.2%
10,000 2327 10.2%
1000-9,999 2009 8.8%
50-249 1687 7.4%
250-999 1354 5.9%

Most occurring characters

Value Count Frequency (%)
e 30678 17.8%
0 21225 12.3%
9 16634 9.7%
12553 7.3%
o 10226 5.9%
s 10226 5.9%
y 10226 5.9%
l 10226 5.9%
p 10226 5.9%
m 10226 5.9%
Other values (7) 29516 17.2%

Most occurring categories

Value Count Frequency (%)
(unknown) 171962 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 171962 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 171962 100.0%
Distinct 7
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
20+ 2415
1-2 2306
3-4 1792
5-9 1421
0 1232
Other values (2) 1060

Length

Max length 5
Median length 3
Mean length 2.9663603
Min length 1

Characters and Unicode

Total characters 30334
Distinct characters 9
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 20+
3rd row 0
4th row 3-4
5th row 20+

Common Values

Value Count Frequency (%)
20+ 2415 23.6%
1-2 2306 22.6%
3-4 1792 17.5%
5-9 1421 13.9%
0 1232 12.0%
10-14 738 7.2%
15-19 322 3.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
20 2415 23.6%
1-2 2306 22.6%
3-4 1792 17.5%
5-9 1421 13.9%
0 1232 12.0%
10-14 738 7.2%
15-19 322 3.1%

Most occurring characters

Value Count Frequency (%)
- 6579 21.7%
2 4721 15.6%
1 4426 14.6%
0 4385 14.5%
4 2530 8.3%
+ 2415 8.0%
3 1792 5.9%
5 1743 5.7%
9 1743 5.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 30334 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 30334 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 30334 100.0%
Distinct 6
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
We recently started using ML methods (i.e., models in production for less than 2 years) 2236
We are exploring ML methods (and may one day put a model into production) 2195
We have well established ML methods (i.e., models in production for more than 2 years) 2077
No (we do not use ML methods) 1737
We use ML methods for generating insights (but do not put working models into production) 1246

Length

Max length 89
Median length 86
Mean length 68.864757
Min length 13

Characters and Unicode

Total characters 704211
Distinct characters 34
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row I do not know
2nd row I do not know
3rd row No (we do not use ML methods)
4th row We have well established ML methods (i.e., models in production for more than 2 years)
5th row We are exploring ML methods (and may one day put a model into production)

Common Values

Value Count Frequency (%)
We recently started using ML methods (i.e., models in production for less than 2 years) 2236 21.9%
We are exploring ML methods (and may one day put a model into production) 2195 21.5%
We have well established ML methods (i.e., models in production for more than 2 years) 2077 20.3%
No (we do not use ML methods) 1737 17.0%
We use ML methods for generating insights (but do not put working models into production) 1246 12.2%
I do not know 735 7.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
we 9491 7.3%
ml 9491 7.3%
methods 9491 7.3%
production 7754 6.0%
models 5559 4.3%
for 5559 4.3%
years 4313 3.3%
i.e 4313 3.3%
in 4313 3.3%
than 4313 3.3%
Other values (29) 64617 50.0%

Most occurring characters

Value Count Frequency (%)
118988 16.9%
e 66751 9.5%
o 59374 8.4%
t 44681 6.3%
n 40315 5.7%
s 37936 5.4%
d 37420 5.3%
i 31313 4.4%
r 31057 4.4%
a 27237 3.9%
Other values (24) 209139 29.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 704211 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 704211 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 704211 100.0%
Distinct 25
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
$0-999 1064
10,000-14,999 685
100,000-124,999 649
30,000-39,999 633
40,000-49,999 622
Other values (20) 6573

Length

Max length 15
Median length 13
Mean length 12.214942
Min length 6

Characters and Unicode

Total characters 124910
Distinct characters 15
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 30,000-39,999
2nd row 250,000-299,999
3rd row 4,000-4,999
4th row 60,000-69,999
5th row 10,000-14,999

Common Values

Value Count Frequency (%)
$0-999 1064 10.4%
10,000-14,999 685 6.7%
100,000-124,999 649 6.3%
30,000-39,999 633 6.2%
40,000-49,999 622 6.1%
50,000-59,999 603 5.9%
60,000-69,999 501 4.9%
70,000-79,999 458 4.5%
15,000-19,999 452 4.4%
20,000-24,999 438 4.3%
Other values (15) 4121 40.3%

Length

[Figure: Histogram of lengths of the category]

Value Count Frequency (%)
0-999 1064 10.4%
10,000-14,999 685 6.7%
100,000-124,999 649 6.3%
30,000-39,999 633 6.2%
40,000-49,999 622 6.1%
50,000-59,999 603 5.9%
60,000-69,999 501 4.9%
70,000-79,999 458 4.5%
15,000-19,999 452 4.4%
20,000-24,999 438 4.3%
Other values (16) 4169 40.6%

Most occurring characters

Value Count Frequency (%)
9 36684 29.4%
0 35355 28.3%
, 18276 14.6%
- 10178 8.1%
1 6036 4.8%
4 4480 3.6%
5 3783 3.0%
2 3758 3.0%
3 1796 1.4%
7 1656 1.3%
Other values (5) 2908 2.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 124910 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 124910 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 124910 100.0%
Distinct 6
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
$0 (USD) 3160
$100-$999 1995
$1000-$9,999 1859
$1-$99 1215
$10,000-$99,999 1128

Length

Max length 17
Median length 15
Mean length 10.221592
Min length 6

Characters and Unicode

Total characters 104526
Distinct characters 13
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row $0 (USD)
2nd row $10,000-$99,999
3rd row $0 (USD)
4th row $10,000-$99,999
5th row $100-$999

Common Values

Value Count Frequency (%)
$0 (USD) 3160 30.9%
$100-$999 1995 19.5%
$1000-$9,999 1859 18.2%
$1-$99 1215 11.9%
$10,000-$99,999 1128 11.0%
> $100,000 ($USD) 869 8.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
usd 4029 26.6%
0 3160 20.9%
100-$999 1995 13.2%
1000-$9,999 1859 12.3%
1-$99 1215 8.0%
10,000-$99,999 1128 7.5%
869 5.7%
100,000 869 5.7%

Most occurring characters

Value Count Frequency (%)
0 21584 20.6%
9 21491 20.6%
$ 17292 16.5%
1 7066 6.8%
- 6197 5.9%
, 4984 4.8%
4898 4.7%
( 4029 3.9%
U 4029 3.9%
S 4029 3.9%
Other values (3) 8927 8.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 104526 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 104526 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 104526 100.0%
Distinct 3469
Distinct (%) 33.9%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB

Length

Max length 89
Median length 86
Mean length 75.375318
Min length 25

Characters and Unicode

Total characters 770788
Distinct characters 55
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3030
Unique (%) 29.6%
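This is the only variable in this part of the report where Distinct and Unique differ noticeably, so the distinction is worth spelling out: Distinct counts the different values present, while Unique counts the values that occur exactly once. The sketch below illustrates both on a hypothetical list (the sample values are invented for illustration and are not from the dataset), then checks the two percentages reported for this variable against the 10226 observations.

```python
from collections import Counter

# Hypothetical sample values, for illustration only.
values = ["a", "b", "b", "c", "c", "c", "d"]

counts = Counter(values)
n_distinct = len(counts)                              # number of different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
print(n_distinct, n_unique)

# Percentages for this variable, relative to the number of observations.
n_observations = 10226
distinct_pct = round(100 * 3469 / n_observations, 1)
unique_pct = round(100 * 3030 / n_observations, 1)
print(distinct_pct, unique_pct)
```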

Sample

1st row Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1
2nd row Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1
3rd row Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1
4th row Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1
5th row Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1

Value Count Frequency (%)
1 42430 36.7%
etc 9495 8.2%
local 5586 4.8%
development 5586 4.8%
environments 5586 4.8%
rstudio 5586 4.8%
jupyterlab 5586 4.8%
software 3909 3.4%
statistical 2287 2.0%
basic 1652 1.4%
Other values (2157) 27938 24.2%

Most occurring characters

Value Count Frequency (%)
105415 13.7%
, 71742 9.3%
e 62218 8.1%
t 48946 6.4%
1 45968 6.0%
- 42580 5.5%
o 35163 4.6%
a 26808 3.5%
n 25019 3.2%
c 24319 3.2%
Other values (45) 282610 36.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 770788 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 770788 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 770788 100.0%
Distinct 6
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
3-5 years 2672
1-2 years 2542
< 1 years 1892
5-10 years 1662
10-20 years 955

Length

Max length 11
Median length 9
Mean length 9.3493057
Min length 9

Characters and Unicode

Total characters 95606
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1-2 years
2nd row 1-2 years
3rd row < 1 years
4th row 20+ years
5th row 3-5 years

Common Values

Value Count Frequency (%)
3-5 years 2672 26.1%
1-2 years 2542 24.9%
< 1 years 1892 18.5%
5-10 years 1662 16.3%
10-20 years 955 9.3%
20+ years 503 4.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
years 10226 45.8%
3-5 2672 12.0%
1-2 2542 11.4%
1892 8.5%
1 1892 8.5%
5-10 1662 7.4%
10-20 955 4.3%
20 503 2.3%

Most occurring characters

Value Count Frequency (%)
12118 12.7%
y 10226 10.7%
e 10226 10.7%
a 10226 10.7%
r 10226 10.7%
s 10226 10.7%
- 7831 8.2%
1 7051 7.4%
5 4334 4.5%
0 4075 4.3%
Other values (4) 9067 9.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 95606 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 95606 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 95606 100.0%
Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
Python 7879
R 1036
SQL 711
Other 600

Length

Max length 6
Median length 6
Mean length 5.2261881
Min length 1

Characters and Unicode

Total characters 53443
Distinct characters 13
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Python
2nd row Python
3rd row Python
4th row Other
5th row Python

Common Values

Value Count Frequency (%)
Python 7879 77.0%
R 1036 10.1%
SQL 711 7.0%
Other 600 5.9%
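Each Frequency (%) entry in the table above is the value's count divided by the number of observations, rounded to one decimal place. A minimal sketch using the four counts listed for this variable:

```python
# Value counts copied from the Common Values table for this variable.
value_counts = {"Python": 7879, "R": 1036, "SQL": 711, "Other": 600}

# The four categories cover every row, so their sum is the observation count.
n = sum(value_counts.values())

# Relative frequency in percent, rounded to one decimal as in the report.
freq = {value: round(100 * count / n, 1) for value, count in value_counts.items()}

for value, pct in freq.items():
    print(value, value_counts[value], f"{pct}%")
```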

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value Count Frequency (%)
python 7879 77.0%
r 1036 10.1%
sql 711 7.0%
other 600 5.9%

Most occurring characters

Value Count Frequency (%)
t 8479 15.9%
h 8479 15.9%
P 7879 14.7%
y 7879 14.7%
o 7879 14.7%
n 7879 14.7%
R 1036 1.9%
S 711 1.3%
Q 711 1.3%
L 711 1.3%
Other values (3) 1800 3.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 53443 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 53443 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 53443 100.0%
Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 80.0 KiB
Never 8251
Used TPU 1975

Length

Max length 8
Median length 5
Mean length 5.5794054
Min length 5

Characters and Unicode

Total characters 57055
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Never
2nd row Used TPU
3rd row Never
4th row Never
5th row Used TPU

Common Values

Value Count Frequency (%)
Never 8251 80.7%
Used TPU 1975 19.3%

Length

2024-11-05T10:03:29.838356image/svg+xmlMatplotlib v3.7.0, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
never  8251   67.6%
used   1975   16.2%
tpu    1975   16.2%
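
The word counts above are consistent with lower-casing each value and splitting it on whitespace. The sketch below reproduces them under that assumption; the tokenisation rule and the name `word_frequencies` are illustrative guesses, not the profiler's documented behaviour.

```python
from collections import Counter


def word_frequencies(value_counts):
    """Return {word: (count, percent)} for a column given as {value: count}."""
    words = Counter()
    for value, count in value_counts.items():
        # Assumed tokenisation: lower-case, then split on whitespace.
        for token in value.lower().split():
            words[token] += count
    total = sum(words.values())
    return {
        word: (n, round(100 * n / total, 1))
        for word, n in words.most_common()
    }


freq = word_frequencies({"Never": 8251, "Used TPU": 1975})
print(freq)
```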

Most occurring characters

Value    Count  Frequency (%)
e        18477  32.4%
N        8251   14.5%
v        8251   14.5%
r        8251   14.5%
U        3950   6.9%
s        1975   3.5%
d        1975   3.5%
(space)  1975   3.5%
T        1975   3.5%
P        1975   3.5%
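
A character table like the one above can be derived from the value counts alone. The following is a small sketch using the counts reported for this variable; `character_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def character_frequencies(value_counts):
    """Count characters across a column given as {value: count}."""
    chars = Counter()
    for value, count in value_counts.items():
        # Each character in a value occurs once per row holding that value.
        for ch, n in Counter(value).items():
            chars[ch] += n * count
    return chars


chars = character_frequencies({"Never": 8251, "Used TPU": 1975})
print(chars.most_common())
```

The letter `e` appears twice in `Never` and once in `Used TPU`, which gives 2 × 8251 + 1975 = 18477, the figure in the first row of the table.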

Most occurring categories

Value      Count  Frequency (%)
(unknown)  57055  100.0%

Most occurring scripts

Value      Count  Frequency (%)
(unknown)  57055  100.0%

Most occurring blocks

Value      Count  Frequency (%)
(unknown)  57055  100.0%
For how many years have you used machine learning methods?

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

< 1 years         2965
1-2 years         2641
2-3 years         1526
3-4 years         946
4-5 years         850
Other values (3)  1298

Length

Max length: 11
Median length: 9
Mean length: 9.1414043
Min length: 9

Characters and Unicode

Total characters: 93480
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1-2 years
2nd row: 2-3 years
3rd row: < 1 years
4th row: 10-15 years
5th row: 2-3 years

Common Values

Value        Count  Frequency (%)
< 1 years    2965   29.0%
1-2 years    2641   25.8%
2-3 years    1526   14.9%
3-4 years    946    9.3%
4-5 years    850    8.3%
5-10 years   808    7.9%
10-15 years  319    3.1%
20+ years    171    1.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
years    10226  43.7%
(blank)  2965   12.7%
1        2965   12.7%
1-2      2641   11.3%
2-3      1526   6.5%
3-4      946    4.0%
4-5      850    3.6%
5-10     808    3.5%
10-15    319    1.4%
20       171    0.7%

Most occurring characters

Value             Count  Frequency (%)
(space)           13191  14.1%
y                 10226  10.9%
e                 10226  10.9%
a                 10226  10.9%
r                 10226  10.9%
s                 10226  10.9%
-                 7090   7.6%
1                 7052   7.5%
2                 4338   4.6%
<                 2965   3.2%
Other values (5)  7714   8.3%

Most occurring categories

Value      Count  Frequency (%)
(unknown)  93480  100.0%

Most occurring scripts

Value      Count  Frequency (%)
(unknown)  93480  100.0%

Most occurring blocks

Value      Count  Frequency (%)
(unknown)  93480  100.0%
Who/what are your favorite media sources that report on data science topics?

Distinct: 902
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 505
Median length: 395
Mean length: 163.61999
Min length: 4

Characters and Unicode

Total characters: 1673178
Distinct characters: 49
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 304
Unique (%): 3.0%
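
The figures above suggest that "Distinct" counts how many different values occur, while "Unique" counts values that occur exactly once. The sketch below computes both under that reading; the interpretation, the toy data, and the helper name `distinct_and_unique` are assumptions made for illustration.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return distinct/unique counts and their share of all rows."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)                                 # different values
    unique = sum(1 for c in counts.values() if c == 1)     # seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


# Toy data (hypothetical): three different values, only "a" occurs once.
result = distinct_and_unique(["a", "b", "b", "c", "c", "c"])
print(result)
```

With the report's numbers, 902 / 10226 is about 8.8% and 304 / 10226 is about 3.0%, which agrees with the percentages shown.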

Sample

1st row: Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc)
2nd row: Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc)
3rd row: YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other
4th row: YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc)
5th row: Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc)

Value              Count   Frequency (%)
etc                28445   13.9%
data               10480   5.1%
science            10480   5.1%
forums             9221    4.5%
kaggle             6783    3.3%
blog               6783    3.3%
social             6783    3.3%
media              6783    3.3%
kdnuggets          6549    3.2%
vidhya             6549    3.2%
Other values (36)  105466  51.6%

Most occurring characters

Value              Count   Frequency (%)
(space)            194096  11.6%
e                  124951  7.5%
a                  117646  7.0%
i                  99029   5.9%
t                  90897   5.4%
,                  90658   5.4%
s                  86028   5.1%
c                  84481   5.0%
o                  78251   4.7%
n                  68805   4.1%
Other values (39)  638336  38.2%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1673178  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1673178  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1673178  100.0%
On which platforms have you begun or completed data science courses?

Distinct: 713
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 169
Median length: 148
Mean length: 41.716018
Min length: 3

Characters and Unicode

Total characters: 426588
Distinct characters: 35
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 244
Unique (%): 2.4%

Sample

1st row: Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy
2nd row: Coursera, edX, DataCamp, University Courses (resulting in a university degree)
3rd row: Other
4th row: None
5th row: Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy

Value              Count  Frequency (%)
kaggle             6470   11.5%
courses            5999   10.6%
coursera           5809   10.3%
university         5528   9.8%
learn              3235   5.7%
i.e                3235   5.7%
udemy              3115   5.5%
a                  2764   4.9%
degree             2764   4.9%
resulting          2764   4.9%
Other values (10)  14684  26.1%

Most occurring characters

Value              Count   Frequency (%)
e                  50570   11.9%
(space)            46141   10.8%
r                  33819   7.9%
a                  32261   7.6%
s                  27655   6.5%
i                  24625   5.8%
g                  19304   4.5%
n                  18367   4.3%
u                  17804   4.2%
t                  16100   3.8%
Other values (25)  139942  32.8%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  426588  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  426588  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  426588  100.0%
Which of the following integrated development environments (IDE's) do you use on a regular basis?

Distinct: 752
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 185
Median length: 158
Mean length: 65.056523
Min length: 4

Characters and Unicode

Total characters: 665268
Distinct characters: 39
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 260
Unique (%): 2.5%

Sample

1st row: Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder
2nd row: Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code
3rd row: Jupyter (JupyterLab, Jupyter Notebooks, etc)
4th row: RStudio , Other
5th row: Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text

Value              Count  Frequency (%)
(blank)            21925  23.0%
jupyter            14970  15.7%
notebooks          7485   7.8%
etc                7485   7.8%
jupyterlab         7485   7.8%
visual             6468   6.8%
studio             6468   6.8%
rstudio            3334   3.5%
code               3234   3.4%
pycharm            2999   3.1%
Other values (10)  13678  14.3%

Most occurring characters

Value              Count   Frequency (%)
(space)            129719  19.5%
t                  53054   8.0%
e                  49514   7.4%
u                  40538   6.1%
o                  39147   5.9%
,                  32269   4.9%
r                  28025   4.2%
y                  27506   4.1%
p                  27004   4.1%
J                  22455   3.4%
Other values (29)  216037  32.5%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  665268  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  665268  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  665268  100.0%
Which of the following hosted notebook products do you use on a regular basis?

Distinct: 228
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 295
Median length: 254
Mean length: 29.36945
Min length: 4

Characters and Unicode

Total characters: 300332
Distinct characters: 44
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 96
Unique (%): 0.9%

Sample

1st row: None
2nd row: Microsoft Azure Notebooks
3rd row: Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc)
4th row: None
5th row: Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub

Value              Count  Frequency (%)
(blank)            5411   12.7%
notebooks          5161   12.1%
none               3870   9.1%
google             3770   8.8%
kernels            3225   7.5%
kaggle             3225   7.5%
colab              2981   7.0%
notebook           1427   3.3%
products           1427   3.3%
etc                1427   3.3%
Other values (20)  10816  25.3%

Most occurring characters

Value              Count  Frequency (%)
(space)            47958  16.0%
o                  39503  13.2%
e                  30393  10.1%
l                  15639  5.2%
t                  14204  4.7%
b                  11584  3.9%
a                  11490  3.8%
s                  11037  3.7%
g                  10858  3.6%
N                  10458  3.5%
Other values (34)  97208  32.4%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  300332  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  300332  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  300332  100.0%
What programming languages do you use on a regular basis?

Distinct: 542
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 70
Median length: 58
Mean length: 15.099844
Min length: 1

Characters and Unicode

Total characters: 154411
Distinct characters: 29
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 181
Unique (%): 1.8%

Sample

1st row: Python, R, SQL, Java, Javascript, MATLAB
2nd row: Python, R, SQL, Bash
3rd row: Python, SQL
4th row: Python, R
5th row: Python, R, Bash

Value       Count  Frequency (%)
python      9015   33.4%
sql         5218   19.3%
r           3514   13.0%
c           2185   8.1%
bash        1685   6.2%
javascript  1617   6.0%
java        1501   5.6%
other       966    3.6%
matlab      921    3.4%
typescript  331    1.2%

Most occurring characters

Value              Count  Frequency (%)
,                  16787  10.9%
(space)            16787  10.9%
t                  11929  7.7%
h                  11666  7.6%
y                  9346   6.1%
o                  9075   5.9%
n                  9075   5.9%
P                  9015   5.8%
a                  7921   5.1%
L                  6139   4.0%
Other values (19)  46671  30.2%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  154411  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  154411  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  154411  100.0%
What data visualization libraries or tools do you use on a regular basis?

Distinct: 412
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 141
Median length: 123
Mean length: 30.946509
Min length: 4

Characters and Unicode

Total characters: 316459
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 158
Unique (%): 1.5%

Sample

1st row: Matplotlib
2nd row: Ggplot / ggplot2 , Matplotlib , Seaborn
3rd row: Matplotlib , Plotly / Plotly Express , Seaborn
4th row: Ggplot / ggplot2
5th row: Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn

Value             Count  Frequency (%)
(blank)           18890  37.4%
matplotlib        7287   14.4%
plotly            4994   9.9%
seaborn           4862   9.6%
ggplot            3177   6.3%
ggplot2           3177   6.3%
express           2497   4.9%
shiny             1079   2.1%
none              915    1.8%
d3.js             903    1.8%
Other values (6)  2722   5.4%

Most occurring characters

Value              Count  Frequency (%)
(space)            70915  22.4%
l                  32892  10.4%
t                  27347  8.6%
o                  26637  8.4%
p                  16602  5.2%
,                  12758  4.0%
a                  12739  4.0%
b                  12613  4.0%
e                  10864  3.4%
g                  9531   3.0%
Other values (28)  83561  26.4%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  316459  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  316459  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  316459  100.0%
Which types of specialized hardware do you use on a regular basis?

Distinct: 14
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

CPUs                  3687
CPUs, GPUs            3679
None / I do not know  1683
GPUs                  751
CPUs, GPUs, TPUs      248
Other values (9)      178

Length

Max length: 23
Median length: 20
Mean length: 9.1785644
Min length: 4

Characters and Unicode

Total characters: 93860
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: CPUs, GPUs
2nd row: CPUs, GPUs
3rd row: CPUs, GPUs
4th row: CPUs, GPUs
5th row: CPUs, GPUs

Common Values

Value                 Count  Frequency (%)
CPUs                  3687   36.1%
CPUs, GPUs            3679   36.0%
None / I do not know  1683   16.5%
GPUs                  751    7.3%
CPUs, GPUs, TPUs      248    2.4%
GPUs, TPUs            50     0.5%
Other                 40     0.4%
CPUs, TPUs            23     0.2%
CPUs, GPUs, Other     21     0.2%
TPUs                  21     0.2%
Other values (4)      23     0.2%

Length

Histogram of lengths of the category

Value    Count  Frequency (%)
cpus     7676   33.4%
gpus     4759   20.7%
none     1683   7.3%
(blank)  1683   7.3%
i        1683   7.3%
do       1683   7.3%
not      1683   7.3%
know     1683   7.3%
tpus     348    1.5%
other    84     0.4%

Most occurring characters

Value              Count  Frequency (%)
P                  12783  13.6%
U                  12783  13.6%
s                  12783  13.6%
(space)            12739  13.6%
C                  7676   8.2%
o                  6732   7.2%
n                  5049   5.4%
G                  4759   5.1%
,                  4324   4.6%
t                  1767   1.9%
Other values (11)  12465  13.3%

Most occurring categories

Value      Count  Frequency (%)
(unknown)  93860  100.0%

Most occurring scripts

Value      Count  Frequency (%)
(unknown)  93860  100.0%

Most occurring blocks

Value      Count  Frequency (%)
(unknown)  93860  100.0%
Which of the following ML algorithms do you use on a regular basis?

Distinct: 630
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 336
Median length: 288
Mean length: 103.96548
Min length: 4

Characters and Unicode

Total characters: 1063151
Distinct characters: 43
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 230
Unique (%): 2.2%

Sample

1st row: Linear or Logistic Regression
2nd row: Linear or Logistic Regression, Convolutional Neural Networks
3rd row: Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc)
4th row: Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks
5th row: Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks

Value              Count  Frequency (%)
or                 13843  10.4%
networks           10146  7.7%
neural             8735   6.6%
linear             7454   5.6%
logistic           7454   5.6%
regression         7454   5.6%
etc                7434   5.6%
decision           6389   4.8%
trees              6389   4.8%
random             6389   4.8%
Other values (20)  50826  38.4%

Most occurring characters

Value              Count   Frequency (%)
(space)            122287  11.5%
e                  103625  9.7%
o                  93270   8.8%
s                  83530   7.9%
r                  78496   7.4%
i                  68651   6.5%
n                  58971   5.5%
t                  57545   5.4%
a                  47648   4.5%
,                  35206   3.3%
Other values (33)  313922  29.5%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1063151  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1063151  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1063151  100.0%
Which categories of ML tools do you use on a regular basis?

Distinct: 92
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 374
Median length: 4
Mean length: 46.464209
Min length: 4

Characters and Unicode

Total characters: 475143
Distinct characters: 41
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 0.2%

Sample

1st row: None
2nd row: Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)
3rd row: None
4th row: Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)
5th row: Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)

Value              Count  Frequency (%)
e.g                7591   13.4%
automated          6645   11.8%
none               5707   10.1%
model              2625   4.6%
selection          2257   4.0%
auto-sklearn       2257   4.0%
xcessiv            2257   4.0%
tuning             1463   2.6%
ray.tune           1463   2.6%
hyperopt           1463   2.6%
Other values (24)  22805  40.3%

Most occurring characters

Value              Count   Frequency (%)
e                  56882   12.0%
(space)            46307   9.7%
t                  40565   8.5%
o                  32965   6.9%
a                  29760   6.3%
n                  27098   5.7%
u                  21484   4.5%
i                  17800   3.7%
r                  16781   3.5%
.                  16645   3.5%
Other values (31)  168856  35.5%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  475143  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  475143  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  475143  100.0%
Which of the following machine learning frameworks do you use on a regular basis?

Distinct: 554
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Memory size: 80.0 KiB

Length

Max length: 129
Median length: 107
Mean length: 36.544299
Min length: 4

Characters and Unicode

Total characters: 373702
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 194
Unique (%): 1.9%

Sample

1st row: None
2nd row: Scikit-learn , TensorFlow , Keras , RandomForest
3rd row: Scikit-learn , RandomForest, Xgboost , LightGBM
4th row: Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret
5th row: Scikit-learn , TensorFlow , Keras , PyTorch

Value             Count  Frequency (%)
(blank)           17585  35.9%
scikit-learn      6883   14.1%
keras             4265   8.7%
tensorflow        4233   8.6%
randomforest      3457   7.1%
xgboost           3367   6.9%
pytorch           2517   5.1%
lightgbm          1734   3.5%
none              1302   2.7%
caret             984    2.0%
Other values (4)  2616   5.3%

Most occurring characters

Value              Count   Frequency (%)
(space)            86727   23.2%
o                  25933   6.9%
r                  23427   6.3%
e                  21408   5.7%
,                  20328   5.4%
a                  17841   4.8%
t                  17433   4.7%
i                  17028   4.6%
s                  16046   4.3%
n                  15875   4.2%
Other values (27)  111656  29.9%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  373702  100.0%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  373702  100.0%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  373702  100.0%

Correlations

Variables (rows and columns appear in this order):

V1   Approximately how many individuals are responsible for data science workloads at your place of business?
V2   Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years?
V3   Does your current employer incorporate machine learning methods into their business?
V4   For how many years have you used machine learning methods?
V5   Have you ever used a TPU (tensor processing unit)?
V6   How long have you been writing code to analyze data (at work or at school)?
V7   Select the title most similar to your current role (or most recent title if retired)
V8   What is the highest level of formal education that you have attained or plan to attain within the next 2 years?
V9   What is the size of the company where you are employed?
V10  What is your age (# years)?
V11  What is your current yearly compensation (approximate $USD)?
V12  What is your gender?
V13  What programming language would you recommend an aspiring data scientist to learn first?
V14  Which types of specialized hardware do you use on a regular basis?

     V1     V2     V3     V4     V5     V6     V7     V8     V9     V10    V11    V12    V13    V14
V1   1.000  0.150  0.244  0.112  0.067  0.123  0.112  0.054  0.301  0.033  0.104  0.023  0.030  0.036
V2   0.150  1.000  0.167  0.157  0.113  0.144  0.087  0.045  0.102  0.096  0.196  0.049  0.032  0.088
V3   0.244  0.167  1.000  0.202  0.098  0.143  0.162  0.062  0.120  0.050  0.137  0.032  0.030  0.098
V4   0.112  0.157  0.202  1.000  0.133  0.464  0.154  0.148  0.036  0.160  0.165  0.065  0.063  0.113
V5   0.067  0.113  0.098  0.133  1.000  0.074  0.087  0.040  0.051  0.082  0.050  0.037  0.069  0.411
V6   0.123  0.144  0.143  0.464  0.074  1.000  0.145  0.149  0.058  0.282  0.228  0.050  0.068  0.077
V7   0.112  0.087  0.162  0.154  0.087  0.145  1.000  0.180  0.052  0.084  0.068  0.086  0.135  0.085
V8   0.054  0.045  0.062  0.148  0.040  0.149  0.180  1.000  0.060  0.152  0.086  0.052  0.054  0.037
V9   0.301  0.102  0.120  0.036  0.051  0.058  0.052  0.060  1.000  0.071  0.137  0.030  0.041  0.040
V10  0.033  0.096  0.050  0.160  0.082  0.282  0.084  0.152  0.071  1.000  0.148  0.064  0.087  0.034
V11  0.104  0.196  0.137  0.165  0.050  0.228  0.068  0.086  0.137  0.148  1.000  0.077  0.057  0.044
V12  0.023  0.049  0.032  0.065  0.037  0.050  0.086  0.052  0.030  0.064  0.077  1.000  0.040  0.163
V13  0.030  0.032  0.030  0.063  0.069  0.068  0.135  0.054  0.041  0.087  0.057  0.040  1.000  0.111
V14  0.036  0.088  0.098  0.113  0.411  0.077  0.085  0.037  0.040  0.034  0.044  0.163  0.111  1.000
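
The report does not say which association measure produced the matrix above. For pairs of categorical variables, one common choice is Cramér's V; the sketch below is an illustrative standard-library implementation under that assumption (without bias correction). `cramers_v` is a hypothetical helper, not ydata-profiling's API, and the example data are made up.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equally long sequences of categories."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts for xs
    col = Counter(ys)              # marginal counts for ys
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    return sqrt(chi2 / (n * k))


# Hypothetical data: a perfectly associated pair and an independent pair.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

Values range from 0 (no association) to 1 (one variable fully determines the other), which is consistent with the 1.000 entries on the diagonal of the matrix.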

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
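
As a rough illustration of what the nullity views summarise, the sketch below counts missing entries per column with plain Python. The rows and the helper `nullity_by_column` are hypothetical; the dataset profiled here reports zero missing cells.

```python
def nullity_by_column(rows):
    """Count None entries per column for a list of dict records."""
    columns = list(rows[0].keys()) if rows else []
    return {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }


# Made-up records for illustration only.
rows = [
    {"age": "22-24", "gender": "Male"},
    {"age": None, "gender": "Male"},
    {"age": "40-44", "gender": None},
]
missing = nullity_by_column(rows)
print(missing)
```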

Sample

First rows

(index) | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis?
0 | 22-24 | Male | France | Master’s degree | Software Engineer | 1000-9,999 employees | 0 | I do not know | 30,000-39,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 | 1-2 years | Python | Never | 1-2 years | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder | None | Python, R, SQL, Java, Javascript, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression | None | None
1 | 40-44 | Male | Australia | Master’s degree | Other | > 10,000 employees | 20+ | I do not know | 250,000-299,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Used TPU | 2-3 years | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, edX, DataCamp, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL, Bash | Ggplot / ggplot2 , Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Convolutional Neural Networks | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , RandomForest
2 | 22-24 | Male | India | Bachelor’s degree | Other | 0-49 employees | 0 | No (we do not use ML methods) | 4,000-4,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 | < 1 years | Python | Never | < 1 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other | Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) | Python, SQL | Matplotlib , Plotly / Plotly Express , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) | None | Scikit-learn , RandomForest, Xgboost , LightGBM
3 | 50-54 | Male | France | Master’s degree | Data Scientist | 0-49 employees | 3-4 | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $10,000-$99,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1 | 20+ years | Other | Never | 10-15 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | None | RStudio , Other | None | Python, R | Ggplot / ggplot2 | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret
4 | 22-24 | Male | India | Master’s degree | Data Scientist | 50-249 employees | 20+ | We are exploring ML methods (and may one day put a model into production) | 10,000-14,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1 | 3-5 years | Python | Used TPU | 2-3 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub | Python, R, Bash | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , PyTorch
5 | 22-24 | Female | United States of America | Bachelor’s degree | Data Scientist | > 10,000 employees | 20+ | We recently started using ML methods (i.e., models in production for less than 2 years) | 80,000-89,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 3, -1 | 3-5 years | Python | Used TPU | 3-4 years | Hacker News (https://news.ycombinator.com/), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder | Microsoft Azure Notebooks , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python | Matplotlib , Plotly / Plotly Express | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | None | Scikit-learn , TensorFlow , Keras , Spark MLib
6 | 55-59 | Male | Netherlands | Master’s degree | Other | 0-49 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 5, -1 | 5-10 years | Python | Never | < 1 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | None | Python, SQL | Matplotlib , D3.js , Seaborn | CPUs | Linear or Logistic Regression, Bayesian Approaches, Generative Adversarial Networks | None | Scikit-learn , PyTorch
7 | 30-34 | Male | Germany | Master’s degree | Statistician | 0-49 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 2,000-2,999 | $1000-$9,999 | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 2, -1, -1, -1, -1 | 5-10 years | R | Used TPU | 4-5 years | Podcasts (Chai Time Data Science, Linear Digressions, etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Code Ocean | R | Matplotlib | CPUs | Bayesian Approaches | Automated data augmentation (e.g. imgaug, albumentations) | Scikit-learn
8 | 30-34 | Male | Germany | Bachelor’s degree | Data Scientist | 50-249 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 70,000-79,999 | $1000-$9,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 6, -1 | 5-10 years | R | Never | 4-5 years | None | edX | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio | None | Python, R | Ggplot / ggplot2 | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Dense Neural Networks (MLPs, etc) | None | Keras , Caret
9 | 30-34 | Male | United States of America | Master’s degree | Product/Project Manager | > 10,000 employees | 20+ | I do not know | 90,000-99,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 1, -1, -1, -1, -1 | 3-5 years | Python | Never | 2-3 years | Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, Coursera, DataQuest, Kaggle Courses (i.e. Kaggle Learn), Fast.ai, Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , Atom , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) , Code Ocean | Python | Matplotlib , Plotly / Plotly Express , Seaborn | None / I do not know | Linear or Logistic Regression, Decision Trees or Random Forests, Bayesian Approaches | None | Scikit-learn , RandomForest
Last rows

(index) | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis?
10216 | 50-54 | Male | France | Some college/university study without earning a bachelor’s degree | Data Scientist | 0-49 employees | 3-4 | We use ML methods for generating insights (but do not put working models into production) | 100,000-124,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 125, -1 | 5-10 years | Python | Used TPU | 4-5 years | Twitter (data science influencers), Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree), Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , Visual Studio / Visual Studio Code | Kaggle Notebooks (Kernels) , Google Colab , Microsoft Azure Notebooks , Binder / JupyterHub | Python, SQL, C++ | Matplotlib , Shiny , Plotly / Plotly Express | TPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Evolutionary Approaches, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks, Transformer Networks (BERT, gpt-2, etc) | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv) | Scikit-learn , TensorFlow , PyTorch , Spark MLib
10217 | 25-29 | Male | Nigeria | Doctoral degree | Data Scientist | 250-999 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | 1,000-1,999 | $100-$999 | Business intelligence software (Salesforce, Tableau, Spotfire, etc.), -1, -1, 337, -1, -1 | < 1 years | Python | Used TPU | < 1 years | Reddit (r/machinelearning, r/datascience, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, edX, Udemy | Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL | Ggplot / ggplot2 | GPUs | Bayesian Approaches | Automated data augmentation (e.g. imgaug, albumentations), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Fast.ai
10218 | 18-21 | Male | Nigeria | Bachelor’s degree | Data Analyst | 250-999 employees | 5-9 | I do not know | 5,000-7,499 | $1000-$9,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 253, -1, -1, -1 | < 1 years | R | Never | < 1 years | Hacker News (https://news.ycombinator.com/), Kaggle (forums, blog, social media, etc) | DataCamp | RStudio | None | Python, R | Ggplot / ggplot2 | CPUs | Linear or Logistic Regression | Automated feature engineering/selection (e.g. tpot, boruta_py) | None
10219 | 35-39 | Male | Saudi Arabia | Master’s degree | Data Scientist | > 10,000 employees | 5-9 | We have well established ML methods (i.e., models in production for more than 2 years) | 100,000-124,999 | $10,000-$99,999 | Other, -1, -1, -1, -1, -1 | 5-10 years | R | Never | 10-15 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, DataCamp | RStudio | None | R, SQL | Ggplot / ggplot2 , Shiny , Plotly / Plotly Express , Leaflet / Folium | CPUs | Gradient Boosting Machines (xgboost, lightgbm, etc) | None | Xgboost , Caret
10220 | 25-29 | Male | Viet Nam | Master’s degree | Data Analyst | 50-249 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $1-$99 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 50, -1 | 1-2 years | Python | Never | 1-2 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, Coursera, Kaggle Courses (i.e. Kaggle Learn), Fast.ai, Udemy, LinkedIn Learning | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab | Python | Matplotlib , Seaborn | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , Xgboost
10221 | 25-29 | Male | India | Master’s degree | Data Scientist | 0-49 employees | 1-2 | We recently started using ML methods (i.e., models in production for less than 2 years) | 1,000-1,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2838, -1 | 3-5 years | Python | Never | 2-3 years | Hacker News (https://news.ycombinator.com/), Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Kaggle Courses (i.e. Kaggle Learn), LinkedIn Learning | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , MATLAB , Notepad++ | Google Cloud Notebook Products (AI Platform, Datalab, etc) , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | Automated data augmentation (e.g. imgaug, albumentations) | Scikit-learn , TensorFlow , PyTorch , Spark MLib
10222 | 22-24 | Female | Other | Bachelor’s degree | Other | 50-249 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | 5,000-7,499 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Never | 1-2 years | Other | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree), Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Atom , Visual Studio / Visual Studio Code , Spyder | Google Colab | Python | Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks | Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , TensorFlow , PyTorch
10223 | 25-29 | Male | China | I prefer not to answer | Data Engineer | 250-999 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 20,000-24,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 12, -1 | 1-2 years | Python | Used TPU | 1-2 years | Other | Kaggle Courses (i.e. Kaggle Learn) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm | Google Colab | Python | Seaborn | GPUs | Dense Neural Networks (MLPs, etc), Recurrent Neural Networks | None | Scikit-learn , TensorFlow , Keras
10224 | 25-29 | Male | Australia | Bachelor’s degree | Other | 1000-9,999 employees | 5-9 | No (we do not use ML methods) | 60,000-69,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 14, -1 | 3-5 years | Python | Never | 1-2 years | Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Fast.ai, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , MATLAB , Visual Studio / Visual Studio Code | None | Python, SQL, MATLAB | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches | None | Scikit-learn , TensorFlow , PyTorch
10225 | 50-54 | Male | France | Bachelor’s degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 25, -1 | 3-5 years | Python | Never | 4-5 years | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | IBM Watson Studio | Python, SQL, Java, Bash | Matplotlib | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , Spark MLib